1 Introduction

Most of us either know a coffee lover or are one ourselves. Coffee has helped many of us stay awake to finish assignments and other important tasks. It has also worked against many of us: coffee drunk late in the evening can make it difficult to fall asleep. Coffee was first exported from Ethiopia to Yemen in the late 15th century ("History of Coffee" 2021). This analysis examines the two main coffee species (Arabica and Robusta), the regions where they are grown, their ratings and the processing methods used.

2 Analysis by Panagiotis Stylianos

2.1 How many bags of coffee from each country were sampled?

We begin our analysis with a summary of the samples behind the coffee ratings. For each country, different varieties of coffee were sampled from different regions and companies, and the size of each country's sample can be summarised with the number_of_bags variable. Knowing the total quantity for each country lets us check whether there is a relationship between the number of bags sampled and the country's rating.

The table below summarises the total number of bags counted for each country.

We observe that more than 40,000 bags of Colombian coffee were sampled for grading.
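The per-country totals in the table amount to a group-and-sum over the number_of_bags variable. The report itself was produced in R with the tidyverse. The sketch below shows an equivalent aggregation in plain Python, assuming each record is a (country_of_origin, number_of_bags) pair. The records are hypothetical placeholders, not values from the coffee ratings data.

```python
from collections import defaultdict

# Hypothetical (country_of_origin, number_of_bags) pairs: made-up values for
# illustration only, not taken from the coffee ratings data set.
records = [
    ("Country A", 250),
    ("Country A", 300),
    ("Country B", 1),
    ("Country C", 100),
    ("Country C", 50),
]


def total_bags_by_country(rows):
    """Sum number_of_bags per country and return (country, total) pairs,
    largest total first."""
    totals = defaultdict(int)
    for country, bags in rows:
        totals[country] += bags
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for country, bags in total_bags_by_country(records):
    print(f"{country}: {bags} bags")
```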


Figure 2.1: Scatterplot of average coffee rating and number of testing samples

Figure 2.1 shows no apparent association between a country's average coffee rating and the number of bags sampled.
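The conclusion above is drawn visually from the scatterplot. A correlation coefficient computed over the same per-country pairs would be a numeric complement. The sketch below computes Pearson's r in plain Python. The per-country values are hypothetical. Because a single country (Colombia) has a far larger bag total than the rest, a rank-based measure or a log-transformed bag count may be more informative than raw Pearson's r; that choice is an assumption, not something the report states.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-country totals of bags and average ratings (illustrative only).
total_bags = [550, 1, 150, 40]
avg_rating = [82.0, 84.5, 83.0, 81.5]
print(f"r = {pearson(total_bags, avg_rating):.3f}")
```

A value of r close to 0 would agree with the visual impression of no linear association.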

2.2 Which country produces the highest rated coffee?

Having found no apparent association between the number of bags sampled and the coffee rating, we can now ask which countries produce the best quality coffee.


Figure 2.2: Coffee Rating Distribution by Country

From Figure 2.2 we notice that the distribution of Ethiopian coffee ratings lies furthest to the right, towards higher scores, indicating that Ethiopian coffee is of excellent quality.
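The ridgeline plot compares whole distributions. A simple numeric companion is to rank countries by a robust summary of their ratings. The sketch below uses the median, which is less affected by occasional outlying scores. That choice is an assumption, not necessarily the author's method. The rating column is assumed to be total_cup_points, a name the report does not state, and the values are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (country_of_origin, total_cup_points) pairs, illustrative only.
ratings = [
    ("Country A", 85.0),
    ("Country A", 87.5),
    ("Country B", 80.0),
    ("Country B", 82.0),
    ("Country C", 84.0),
]


def rank_by_median(rows):
    """Return (country, median rating) pairs, highest median first."""
    by_country = defaultdict(list)
    for country, score in rows:
        by_country[country].append(score)
    medians = {country: median(scores) for country, scores in by_country.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


for country, med in rank_by_median(ratings):
    print(f"{country}: median rating {med:.2f}")
```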

3 Analysis by Yiwen Liu

In this section, I compare Arabica and Robusta coffee beans in terms of where they are grown, the altitude of their production areas and their grades.

3.1 Which three countries produced the most Arabica and Robusta samples, respectively?

Table 3.1 shows that Mexico, Colombia and Guatemala contributed the most Arabica samples, while India, Uganda and Ecuador contributed the most Robusta samples. It also shows that the data set contains far more Arabica samples than Robusta samples, which is consistent with Bunn et al. (2015).

Table 3.1: The three countries with the most Arabica and Robusta samples, respectively
country_of_origin species n
Mexico Arabica 236
Colombia Arabica 183
Guatemala Arabica 181
India Robusta 13
Uganda Robusta 10
Ecuador Robusta 2
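The n column in Table 3.1 appears to count records for each combination of country and species, with only the top three countries per species kept. The sketch below shows that count-then-rank step in plain Python. The records are hypothetical, and breaking ties alphabetically is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical (country_of_origin, species) records, one per rated sample.
samples = (
    [("Country A", "Arabica")] * 5
    + [("Country B", "Arabica")] * 3
    + [("Country C", "Arabica")] * 4
    + [("Country D", "Arabica")] * 1
    + [("Country E", "Robusta")] * 2
    + [("Country F", "Robusta")] * 1
)


def top_countries_per_species(rows, n=3):
    """For each species, return the n countries with the most records
    as (country, count) pairs; ties are broken alphabetically."""
    counts = Counter(rows)  # keys are (country, species) pairs
    result = {}
    for species in sorted({sp for _, sp in rows}):
        per_species = [(c, k) for (c, sp), k in counts.items() if sp == species]
        per_species.sort(key=lambda item: (-item[1], item[0]))
        result[species] = per_species[:n]
    return result


for species, top in top_countries_per_species(samples).items():
    print(species, top)
```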

Figure 3.1 shows the geographical location of these six countries. They lie in what appears to be a clear coffee-growing zone, mostly between the equator and 30 degrees north latitude (Ecuador and Uganda straddle the equator). In this zone, the annual average temperature and rainfall suit the growing conditions of coffee beans.

The map also shows that the leading Arabica countries are all in Latin America, while the leading Robusta countries are spread over several regions: South America, East Africa and South Asia. This probably reflects the different growing environments the two species require.


Figure 3.1: Geographical location of the three countries with the most Arabica and Robusta samples, respectively

3.2 How do the altitudes of Arabica and Robusta production areas differ?

Figure 3.2 shows that the mean altitude of Arabica production areas is concentrated between 1000 and 1800 meters. Surprisingly, the mean altitude of Robusta production areas has two peaks, concentrated between 500 and 1600 meters and between 2800 and 3400 meters respectively, although the density of the second peak is much lower than that of the first. Because Robusta is usually grown at lower altitudes than Arabica, this second peak may reflect a small number of unusual or erroneous records.

Overall, Arabica production areas tend to lie at higher altitudes than most Robusta production areas, with some exceptions.
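Comparing the two density curves by eye can be backed up by comparing a robust summary such as the median altitude per species. The median is barely moved by a small second peak. The sketch below assumes each record is a (species, mean altitude in meters) pair, with missing altitudes stored as None and skipped. All values are hypothetical. The single high Robusta value mimics a small second peak and does not shift the median.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (species, mean altitude in meters) pairs; None marks a missing
# altitude. Values are illustrative only.
altitudes = [
    ("Arabica", 1200), ("Arabica", 1500), ("Arabica", None), ("Arabica", 1700),
    ("Robusta", 600), ("Robusta", 900), ("Robusta", 3000),
]


def median_altitude_by_species(rows):
    """Median altitude per species, ignoring missing values."""
    by_species = defaultdict(list)
    for species, alt in rows:
        if alt is not None:
            by_species[species].append(alt)
    return {species: median(vals) for species, vals in by_species.items()}


print(median_altitude_by_species(altitudes))
```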


Figure 3.2: The mean altitude of Arabica coffee beans and Robusta coffee beans production areas

3.3 Which species, Arabica or Robusta, receives higher grades?

Figure 3.3 shows the scores of Arabica and Robusta coffee beans on several primary aspects, together with their total points. In acidity, aftertaste, aroma and flavor, the median score of Robusta is higher than that of Arabica, so Robusta performs better on these aspects. For sweetness, Arabica scores much higher than Robusta. Finally, the total points, which combine these primary aspects with several others, show that Arabica has the better overall quality. These grades may help people choose between the two types of coffee beans.
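The comparison above rests on the median score of each species for each aspect. The sketch below computes such a per-species, per-attribute median in plain Python. The records are hypothetical dictionaries whose keys follow the aspect names used in the text. Only two attributes are shown for brevity, and the scores are illustrative only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical cupping records; attribute names follow the aspects discussed
# above, values are illustrative only.
cuppings = [
    {"species": "Arabica", "aroma": 7.25, "sweetness": 10.0},
    {"species": "Arabica", "aroma": 7.75, "sweetness": 10.0},
    {"species": "Robusta", "aroma": 7.75, "sweetness": 7.5},
]


def medians_by_species(rows, attributes):
    """Return {species: {attribute: median score}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for attr in attributes:
            grouped[row["species"]][attr].append(row[attr])
    return {
        species: {attr: median(scores) for attr, scores in attrs.items()}
        for species, attrs in grouped.items()
    }


print(medians_by_species(cuppings, ["aroma", "sweetness"]))
```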


Figure 3.3: Scores of several different aspects of Arabica coffee beans and Robusta coffee beans and their total points

4 Analysis by Sahinya Akila

4.1 Which processing method leads to better ratings?

Figure 4.1: Distribution of Coffee Ratings based on Processing method

In Figure 4.1, the Semi-washed/Semi-pulped and Pulped natural honey methods receive better ratings, as their average rating does not fall below 8. In the pulped natural (honey) process, the skin of the coffee fruit is removed and the beans are dried while the sticky mucilage is still on them. It is essentially a middle ground between the dry and wet processing methods: in the natural (dry) method the beans are dried entirely in their natural form, while the washed (wet) process removes all of the soft fruit residue, both skin and pulp, before the coffee is dried (Costa 2020). Figure 4.1 also shows that the Washed/Wet method has the lowest ratings, which suggests that it is not one of the best processing methods.
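Reading Figure 4.1 amounts to comparing, for each processing method, the average rating and how low the ratings go. The sketch below computes the mean, minimum and record count per method in plain Python. The method labels mirror those in the text. The (method, rating) pairs are hypothetical, and the 0 to 10 rating scale is inferred from the "does not fall below 8" remark rather than stated in the report. Reporting the count alongside the mean makes it visible when a method's average rests on only a few samples.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (processing_method, rating) pairs, illustrative only.
processed = [
    ("Washed/Wet", 7.5), ("Washed/Wet", 8.25),
    ("Natural/Dry", 8.0), ("Natural/Dry", 8.5),
    ("Pulped natural/honey", 8.25), ("Pulped natural/honey", 8.75),
]


def summarise_by_method(rows):
    """Return {method: (mean rating, minimum rating, count)},
    ordered from highest to lowest mean rating."""
    groups = defaultdict(list)
    for method, rating in rows:
        groups[method].append(rating)
    summary = {m: (mean(r), min(r), len(r)) for m, r in groups.items()}
    return dict(sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True))


for method, (avg, low, n) in summarise_by_method(processed).items():
    print(f"{method}: mean {avg:.2f}, min {low:.2f}, n={n}")
```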

4.2 Which harvest year produced the best coffee?


Figure 4.2: Coffee Ratings in each harvest year

Figure 4.2 shows that the data cover the harvest years 2010 to 2018. Of these, 2012 has the best ratings and 2018 the lowest. The strong 2012 result may be due to favorable weather conditions in almost all coffee-producing countries that year (Robinson 2012).

Table 4.1: Number of records for each year
harvest_year count
2012 354
2014 233
2013 181
2015 129
2016 124
2017 70
2011 26
2010 10
2018 1

As Table 4.1 shows, there is only one record for 2018, which suggests that data for that year are missing from the data set. The low 2018 rating therefore rests on a single sample and should be interpreted with caution.
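Table 4.1 suggests that yearly averages should be read together with the number of records behind them. The sketch below computes the count and mean rating per harvest year and flags years with too few records to compare. It assumes harvest years have already been cleaned to integers. The minimum-record threshold (MIN_RECORDS) is a hypothetical choice, and all values are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (harvest_year, rating) pairs, illustrative only.
harvests = [
    (2012, 84.0), (2012, 83.0), (2012, 85.0),
    (2013, 82.0), (2013, 83.0),
    (2018, 80.0),
]

MIN_RECORDS = 3  # hypothetical threshold for a comparable yearly average


def yearly_summary(rows, min_records=MIN_RECORDS):
    """Return {year: (count, mean rating, enough_data)} in year order."""
    by_year = defaultdict(list)
    for year, rating in rows:
        by_year[year].append(rating)
    return {
        year: (len(r), mean(r), len(r) >= min_records)
        for year, r in sorted(by_year.items())
    }


for year, (count, avg, ok) in yearly_summary(harvests).items():
    note = "" if ok else "  (too few records to compare)"
    print(f"{year}: n={count}, mean={avg:.2f}{note}")
```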

5 Conclusion

From the analysis above, we found no apparent relationship between the number of bags sampled and the average coffee rating. The largest number of bags sampled is 41,204 from Colombia and the smallest is a single bag from Mauritius. Ethiopian coffee receives the highest ratings.

We also found that most Arabica samples come from Mexico, Colombia and Guatemala (Latin America), whereas most Robusta samples come from India, Uganda and Ecuador. In addition, Arabica production areas tend to lie at higher altitudes than most Robusta production areas, with some exceptions. Moreover, Arabica has a higher median total score than Robusta, indicating better overall quality.

Finally, the Semi-washed/Semi-pulped and Pulped natural honey processing methods receive better ratings than the Washed/Wet method, and 2012 was the best harvest year in the data set.

6 Acknowledgements

This report was written using R (R Core Team 2021). The following R packages were used to produce this report: tidyverse (Wickham et al. 2019), readr (Wickham and Hester 2020), kableExtra (Zhu 2021), bookdown (Xie 2020), maps (Becker et al. 2018), knitr (Xie 2021), dplyr (Wickham et al. 2021), hrbrthemes (Rudis 2020), viridis (Garnier 2018), plotly (Sievert 2020), leaflet (Cheng, Karambelkar, and Xie 2021) and ggridges (Wilke 2021).

The original data used for the analysis come from this GitHub repository.

References

Bunn, Christian, Peter Läderach, Oriana Ovalle Rivera, and Dieter Kirschke. 2015. “A Bitter Cup: Climate Change Profile of Global Production of Arabica and Robusta Coffee.” Climatic Change 129 (1): 89–101.
Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2021. Leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.
Costa, Bruna. 2020. "Coffee Processing: Understanding Pulped Natural Coffee." Perfect Daily Grind. https://perfectdailygrind.com/2016/06/coffee-processing-understanding-pulped-natural-coffee/.
Garnier, Simon. 2018. Viridis: Default Color Maps from ’Matplotlib’. https://CRAN.R-project.org/package=viridis.
“History of Coffee.” 2021. Wikipedia. Wikimedia Foundation. https://en.wikipedia.org/wiki/History_of_coffee.
R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Becker, Richard A., Allan R. Wilks, Ray Brownrigg, Thomas P. Minka, and Alex Deckmyn. 2018. Maps: Draw Geographical Maps. https://CRAN.R-project.org/package=maps.
Robinson, V. 2012. “Global Agriculture Environment.” CFW Plexus, no. AACCI 2012 Annual Meeting. https://doi.org/10.1094/cplex-2012-1118-01w.
Rudis, Bob. 2020. Hrbrthemes: Additional Themes, Theme Components and Utilities for ’Ggplot2’. https://CRAN.R-project.org/package=hrbrthemes.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Jim Hester. 2020. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wilke, Claus O. 2021. Ggridges: Ridgeline Plots in ’Ggplot2’. https://CRAN.R-project.org/package=ggridges.
Xie, Yihui. 2020. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.
———. 2021. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.
Zhu, Hao. 2021. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.